Background



Patient self-reported symptoms are of crucial impor- 
tance to identify anxiety disorders, as well as to moni- 
tor their treatment in clinical practice and research. 
Thus, for evidence-based medicine, a precise, reliable, 
and valid (ie, "objective") assessment of the patient's 
reported "subjective" symptoms is warranted. There is 
a plethora of instruments available, which can provide 
psychometrically sound assessments of anxiety, but 
there are several limitations of current tools that need 
to be carefully considered for their successful use. Nev- 
ertheless, the empirical assessment of mental health 
status is not as accepted in medicine as is the assess- 
ment of biomarkers. One reason for this may be that 
different instruments assessing the same psychological 
construct use different scales. In this paper we present 
some new developments that promise to provide one 
common metric for the assessment of anxiety, to fa- 
cilitate the general acceptance of mental health assess- 
ments in the future. 
Symptoms of anxiety are part of everyone's life.
They have essential signal functions helping us to ma- 
neuver through our daily challenges. However, when 
anxiety is experienced without adequate stimuli, it may 
become a seriously burdening condition. In fact, the 
prevalence of anxiety disorders has been constantly ris- 
ing over the past decades, becoming the seventh most 
burdensome condition of all diseases worldwide today.
As with other mental disorders, patients' self-re- 
ported symptoms are of crucial importance to diagnose 
anxiety disorders, as well as to monitor treatment suc- 
cess. For evidence-based medicine, a precise, reliable, 
and valid (ie, "objective") assessment of the patient's 
reported "subjective" symptoms is essential. In this pa- 
per we will focus on the state-of-the-art tools for assess- 
ing patients' self-reported symptoms, often also called 
patient-reported outcomes (PROs). We favor the term 
"symptoms" over "outcomes," as it includes the assessment
of psychological constructs as outcomes, as outcome
predictors, or to screen for anxiety disorders.

Measurement of self-reported 
symptoms of anxiety 

The empirical assessment of anxiety raises a number
of conceptual and methodological issues that need to
be addressed before the use of a particular instrument
can be considered.

Conceptual issues 

Types of anxiety disorders 

Currently, at least four main subtypes of anxiety are
usually distinguished within scientific publications,
including generalized anxiety disorder (GAD), phobic
disorders, panic disorders, and post-traumatic stress
disorder (PTSD). All present with a different
symptomatology, and different etiological conditions are
assumed. From a measurement perspective, the first step in
selecting an appropriate tool is to decide whether the
assessment shall focus on more generic symptoms of anxiety,
which are present in several anxiety disorders, or on more
specific symptoms of one particular disorder.

Dimensionality of the construct 

Instruments assessing emotional distress often sample 
items from different domains (eg, mood, cognition, 
behavior, and somatic symptoms) to capture a com- 
prehensive set of manifest indicators of the underlying 
latent construct. Empirical studies have demonstrated 
that emotional distress can be described with a "tri- 
partite" model, distinguishing three principal compo- 
nents: general distress, physiological hyperarousal, and 
anhedonia. General distress is usually present in both
depressive and anxiety disorders, while symptoms of 
anhedonia are more characteristic of depressive disor- 
ders, and symptoms of hyperarousal are more specific 
to anxiety disorders, in particular panic disorders and 
PTSD. 

Accordingly, instruments developed for assessing
panic disorders more often include somatic symptoms
(eg, palpitations, sweating, and dyspnea) than tools
primarily used for GAD or the global assessment of
anxiety, which usually focus on the assessment of moods,
cognitions, or behaviors (eg, tension, nervousness,
concerns, and inability to relax).

Screening for anxiety disorders 

Screening for mental disorders has been frequently
recommended to identify comorbid mental disorders in
chronic medical conditions, such as coronary heart disease
or diabetes mellitus. In fact, in primary care practice only
about half of the patients having a depressive disorder
are identified. Self-reported screening tools can help
the health care provider in busy daily routines to identify
those patients with minimal additional effort.

However, the fundamental challenge of all established
screening tools is that the measurement of a "dimensional"
construct must support a "categorical" diagnostic
decision. Thus, for a natural phenomenon like
depressed mood or anxiety, a cutoff value needs to be
defined above which a certain condition is likely to be
classified as a pathology, according to consensus
documents like the Diagnostic and Statistical Manual of
Mental Disorders (DSM) or the International Classification
of Diseases (ICD).

Well-validated screening tools for depressive disorders
(eg, the Patient Health Questionnaire-9 [PHQ-9]) usually
provide good sensitivity (ie, the likelihood that patients
with depression are identified; >0.85) and specificity
(ie, the likelihood that patients without depression are
correctly classified as not having the disorder). Screening
tools for anxiety disorders typically provide less favorable
results, at least in clinical populations. One reason is that
different types of anxiety disorders have more heterogeneous
symptoms than different types of depressive disorders.
Another reason is that normal anxiety reactions
in clinical samples typically show a greater overlap with
the anxiety symptoms expressed by patients diagnosed
as having an anxiety disorder (Figure 1).
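
The trade-off created by a categorical cutoff on a dimensional score can be illustrated with a minimal Python sketch. The scores, the diagnoses, and the cutoff values below are hypothetical and chosen only for illustration; they are not data from any of the instruments or studies discussed here.

```python
def screening_accuracy(scores, has_disorder, cutoff):
    """Return (sensitivity, specificity); scores >= cutoff count as screen-positive."""
    pairs = list(zip(scores, has_disorder))
    tp = sum(1 for s, d in pairs if d and s >= cutoff)      # cases detected
    fn = sum(1 for s, d in pairs if d and s < cutoff)       # cases missed
    tn = sum(1 for s, d in pairs if not d and s < cutoff)   # non-cases cleared
    fp = sum(1 for s, d in pairs if not d and s >= cutoff)  # false alarms
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical questionnaire sum scores and interview-based diagnoses.
scores = [3, 5, 8, 9, 11, 12, 14, 6, 10, 15]
has_disorder = [False, False, False, True, True, True, True, False, False, True]

for cutoff in (8, 10, 12):
    sens, spec = screening_accuracy(scores, has_disorder, cutoff)
    print(f"cutoff {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

Raising the cutoff lowers sensitivity and raises specificity; the more the score distributions of patients with and without a disorder overlap, the less favorable any single cutoff becomes.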

Monitoring of anxiety symptoms 

For the treatment of mental disorders, an empirical
assessment of the key symptoms is essential to monitor
treatment success. Symptoms are usually measured as
manifest "observable" variables (eg, "In the past seven
days I worried about what could happen to me"
[Patient-Reported Outcomes Measurement Information
System (PROMIS) Anxiety Item]) of an underlying
"latent" construct (eg, anxiety). Within the majority
of established instruments, anxiety is assumed to be a
state variable which can rapidly change over time. If
longer recall periods are used (eg, "Indicate how much
you have been bothered by... fear of dying... during
the past month" [Beck Anxiety Inventory (BAI)]), the
manifest variable is assumed to measure a more stable
aspect of the latent anxiety construct. Different recall
periods may be appropriate depending on the treatment
goal that is defined.

Polarity of the construct 

From a measurement perspective we must assume a
dimensional construct with lower or higher quantities. For
most mental health constructs it has been extensively
discussed whether the assumption of the construct
being "unipolar" is appropriate (ie, from no anxiety to high
anxiety). This model is usually favored by clinicians
using a pathology model, in which anxiety is a symptom of a
disease and the absence of symptoms represents health.
Another model, more often favored by epidemiologists,
is to assume "bipolar" mental health states, ie, a
continuum from high stress resilience through situation
avoidance to extreme states of anxiety. The latter
conceptualizes anxiety as a natural emotional phenomenon,
with different states of anxiety as responses to
environmental challenges. This model has a larger measurement
range and allows the assessment of different anxiety
levels in the normal population as well.

Measurement issues 

Today a wide range of well-validated outcome tools is
available that can be used for monitoring anxiety
disorders. Most of these have been developed using Classical
Test Theory (CTT) methods. Because self-assessment
instruments have become increasingly important in the
medical field, their limitations in measurement coverage
and precision are more intensely discussed, and other
test development methods are gaining more interest.
In the next paragraphs, we highlight some
of the main restrictions of established instruments and
some potential solutions to those limitations.

Figure 1. Example of the relation between measurement precision, measurement range of five psychometric instruments, and the distribution
of the latent traits depression and anxiety in a sample of 194 heart failure patients with and without a comorbid mental disorder (for
details see ref 106). PHQ, Patient Health Questionnaire; HADS, Hospital Anxiety and Depression Scale; PROMIS, Patient-Reported
Outcomes Measurement Information System

Adapted from ref 106: Fischer HF, Klug C, Roeper K, et al. Screening for mental disorders in heart failure patients using computer-adaptive tests. Qual Life
Res. 2014;23:1609-1618. Copyright © Springer Science + Business Media 2014

Precision and respondent burden 

The most precise and comprehensive health assessment
questionnaires are rather lengthy and complex, leading
to a level of respondent burden that hampers their use
in routine care and often leads to substantial problems
of missing data. Therefore, the tools that are popular
today are relatively short questionnaires. They represent
a compromise in measurement precision, range,
and other desirable attributes in favor of practicality.
These short forms are useful for measuring the health
status of large samples, but in small samples or when
test scores of individual patients are evaluated, their
reduced precision is a concern.

Comparability 

Another major limitation of traditional tools has been
that results from different questionnaires are difficult
to compare, even when two similar instruments are
used to assess the same outcome of the same disease, as
every instrument uses its own metric. The heterogeneity
of scale-specific metrics seriously impairs comparability
across study results and complicates communication
among researchers and clinicians. Pooling study results
from different measures in quantitative reviews or
meta-analyses is difficult and may even lead to biased
results. The situation is as if body temperatures
assessed in different settings were not comparable with
one another, but were dependent on the particular
thermometer used.

New generation of measurement tools 

To make the measurement of psychological constructs
more similar to biomedical ones, a standardized, efficient
approach for a variety of applications, including
ambulatory monitoring, clinical trials research, and
population monitoring, should be established, so that results
can be compared across conditions, therapies, trials, and
patients.

Item response theory (IRT) 

IRT provides a solution to many of the limitations of
CTT. IRT methods were developed more than four decades
ago, and numerous attempts have been made
to exploit their potential. Today, IRT-based tests are
well established in the educational field, but have
only recently been adopted in health care.

Like factor analysis, IRT models assume that the
measured construct is a latent variable, referred to as
the IRT score (θ), which cannot be observed directly
but can be estimated based on responses to different
items measuring the same construct. An IRT item bank
consists of items measuring the same construct and a
mathematical description of the items' measurement
properties. The IRT model describes the probability
of choosing each response to the item as a function
of θ. Several different IRT models are used in health
care applications, each with unique psychometric
properties. One important distinction of all IRT
methods from CTT methods is that θ can be estimated
from the responses to any subset of items in the
bank. Accordingly, researchers or clinicians can select
items that are most relevant for a given group or an
individual patient, and score the responses on one common
metric that is independent of the choice of items.
If the item bank contains items from established
questionnaires, scores on these questionnaires can be
predicted from estimates of θ, even if the questionnaires
themselves have not been administered ("equating").
Thus, test scores of different questionnaires can easily
be compared on one common metric.
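
To make the idea of scoring any subset of items on one metric concrete, the following minimal Python sketch estimates θ from two different item subsets. It assumes a two-parameter logistic (2PL) model with yes/no items and a standard normal prior, which is only one of the IRT models in use; the item names and parameters are hypothetical and do not come from any published item bank.

```python
import math

# Hypothetical item bank: item -> (discrimination a, location b), 2PL model.
ITEM_BANK = {
    "worried": (1.8, -0.5),
    "tense": (1.5, 0.0),
    "panicky": (2.0, 1.0),
    "fear_of_dying": (1.2, 2.0),
}

def p_endorse(theta, a, b):
    """2PL probability of endorsing an item at latent anxiety level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def eap_theta(responses):
    """Expected a posteriori estimate of theta from any subset of bank items.

    responses maps item name -> 0/1; a standard normal prior is assumed.
    """
    grid = [g / 10.0 for g in range(-40, 41)]
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalized N(0, 1) prior
        for item, x in responses.items():
            p = p_endorse(theta, *ITEM_BANK[item])
            w *= p if x == 1 else 1.0 - p   # likelihood of the observed answer
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total

# Two respondents answer different item subsets but are scored on the same metric.
print(eap_theta({"worried": 1, "tense": 1}))
print(eap_theta({"panicky": 1, "fear_of_dying": 0}))
```

Both estimates are expressed on the same θ scale even though no item was shared, which is the property that allows scores from different questionnaires to be compared.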

Computerized adaptive tests (CATs)

To use self-reported assessments efficiently, a
comprehensive electronic data-capturing system is warranted.
Several providers offer their services for clinical research
purposes, and the market for electronic health systems
that include patient self-assessments within the
electronic medical record (EMR) is currently evolving.
Because modern patient self-assessments usually use some
kind of computer-assisted data collection method, the
application of CATs is tempting. This new generation of
PRO tools promises to provide very short and reliable
assessments. The principle of a CAT is to select
and administer only the most informative items for every
individual patient, according to her or his estimated θ
value from an IRT item bank. After each item is
administered, the IRT score is reestimated to choose and apply
the next best-suited item for the current score estimate.
CATs generally use two different ways to end the
assessment ("stopping rules"): the CAT stops either after a
predefined measurement precision (confidence interval)
has been achieved or after a predefined total number
of items has been administered. These stopping rules
can also be combined or can be flexible with respect to
the measurement range, eg, in a particular measurement
range a higher precision can be demanded than in other
ranges. By omitting irrelevant, uninformative items,
higher measurement precision is achieved while, at the same
time, respondent burden can be reduced.
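
The CAT principle and its two stopping rules can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the algorithm of any particular CAT: it assumes a 2PL model with yes/no items, selects items by maximum Fisher information, and uses the posterior standard deviation as the precision criterion. The item bank, `se_target`, and `max_items` are hypothetical.

```python
import math

# Hypothetical 2PL item bank: item -> (discrimination a, location b).
BANK = {
    "i1": (1.6, -1.5), "i2": (1.8, -0.5), "i3": (2.0, 0.0),
    "i4": (1.7, 0.8), "i5": (1.5, 1.5), "i6": (1.9, 2.2),
}
GRID = [g / 10.0 for g in range(-40, 41)]

def prob(theta, a, b):
    """2PL probability of endorsing an item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def information(theta, a, b):
    """Fisher information of a 2PL item at theta."""
    p = prob(theta, a, b)
    return a * a * p * (1.0 - p)

def posterior_mean_sd(answers):
    """Score estimate and its standard error under a standard normal prior."""
    weights = []
    for t in GRID:
        w = math.exp(-0.5 * t * t)
        for item, x in answers.items():
            p = prob(t, *BANK[item])
            w *= p if x else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(t * w for t, w in zip(GRID, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)

def run_cat(answer_fn, se_target=0.4, max_items=4):
    """Administer items until the precision target or the item limit is reached."""
    answers = {}
    theta, se = posterior_mean_sd(answers)
    while se > se_target and len(answers) < min(max_items, len(BANK)):
        remaining = [i for i in BANK if i not in answers]
        # Pick the item that is most informative at the current estimate.
        best = max(remaining, key=lambda i: information(theta, *BANK[i]))
        answers[best] = answer_fn(best)
        theta, se = posterior_mean_sd(answers)  # reestimate after each item
    return theta, se, list(answers)

# Simulated respondent who endorses every item presented.
theta, se, used = run_cat(lambda item: 1)
print(used, round(theta, 2), round(se, 2))
```

Here the two stopping rules are combined: the loop ends as soon as either the precision target or the maximum number of items is reached.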

In 2004, the US National Institutes of Health (NIH)
initiated a large project as part of their roadmap
initiatives to address the need for a comprehensive
Patient-Reported Outcomes Measurement Information
System (PROMIS, www.nihpromis.org). The initiative
was launched nationwide to systematically develop a
new generation of PRO tools applying IRT methods and
CATs. As of today, the PROMIS initiative is the
best-financed effort to improve the assessment of PROs,
and its tools are currently being adapted to other
languages. It will very likely have a substantial influence
on mental health monitoring in the future, including the
assessment of anxiety.

Day reconstruction methods and ecological
momentary assessments

Whereas the use of IRT and CAT methods has been
strongly promoted by the US NIH, the Food and Drug
Administration (FDA) has shown increasing interest
in the use of Ecological Momentary Assessments
(EMA) or Day Reconstruction Methods (DRM).
These methods address the problem of potential recall
bias. It has been questioned what a tool measures
when a patient is asked to report on his/her mental
state over a time span of one or more weeks, ie, to what
extent the current mental health state distorts the
recalled assessment. This may be particularly important
for the assessment of emotional distress constructs like
anxiety. DRM and EMA typically use some kind of
electronic data-capturing device (eg, a smartphone) to
assess a patient's health status under daily life
conditions. The patient is asked at different planned or
randomly selected intervals to report their current health
status. All collected time points are later integrated
into a comprehensive picture of the health status over a
given time span. Although EMA and DRM have been used
infrequently so far, they may become a relevant
technology for PRO assessments outside of clinical
environments.
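
As a minimal sketch of the EMA procedure, the Python code below draws random prompt times within a waking-hours window and integrates the momentary ratings into a daily summary. The function names, the 9:00 to 21:00 window, and the 0-10 rating scale are assumptions made for this illustration and are not taken from any specific EMA system.

```python
import random
from statistics import mean

def random_prompt_times(n_prompts, start_hour=9, end_hour=21, seed=None):
    """Draw n distinct random prompt times (minutes since midnight)."""
    rng = random.Random(seed)
    minutes = rng.sample(range(start_hour * 60, end_hour * 60), n_prompts)
    return sorted(minutes)

def summarize_day(ratings):
    """Integrate momentary ratings (eg, 0-10 anxiety) into a daily summary."""
    return {"mean": mean(ratings), "peak": max(ratings), "n": len(ratings)}

# Five randomly timed prompts for one day, then a summary of the answers given.
for t in random_prompt_times(5, seed=1):
    print(f"prompt at {t // 60:02d}:{t % 60:02d}")
print(summarize_day([2, 4, 9]))
```

Because each rating refers to the current moment, the daily summary does not rely on the patient's recall of the preceding days or weeks.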

Self-assessment instruments of 
anxiety symptoms 

Searching PubMed and PubPsych using the keywords
"anxiety questionnaire/survey/test/scale" in titles
revealed more than 1000 publications since the 1950s.
The first anxiety questionnaires listed were developed
by Cattell and Scheier and by Taylor, personality
psychologists who conceptualized anxiety as a trait.
These anxiety measures were popular for many years
after their development, but are now used infrequently.

Since the 1950s numerous anxiety questionnaires
have been built, first using factor analytic methods, then
extending to the full range of methods used within the CTT
framework, up to the IRT methods more often used today.
A recent review of existing instruments shows that 145
different scales measuring anxiety are used today.

Commonly used tools 

Contemporary anxiety instruments can be categorized 
into two groups: 

a) generic tools, which aim to measure the common as- 
pects of different anxiety disorders 

b) specific tools, which aim to assess anxiety in response
to particular situations in the medical field (eg, dental
anxiety or cancer anxiety), as well as outside
the medical field (eg, test anxiety, computer anxiety,
or dating anxiety)

For both areas well-validated tools are available, 
which can be used for screening purposes, or outcome 
assessments, or both (Tables I and II). 

Generic anxiety measures 

One prominent example of a generic instrument to
monitor anxiety symptoms is the State Trait Anxiety
Inventory (STAI). Its items were primarily developed
to optimize the psychometric properties of the tool.
Table I. Examples of generic anxiety questionnaires. Further examples of generic anxiety scales are: Cattell and Scheier's Anxiety Scale, Taylor Manifest Anxiety Scale, Worry and Anxiety Questionnaire, Lehrer Woolfolk Anxiety Symptom Questionnaire (LWASQ), Four Systems Anxiety Questionnaire, Worries and Emotionality Scale. Most anxiety questionnaires have been built based on principles of classical test theory (CTT); however, some anxiety tests have also been reanalyzed using modern item response theory (IRT) methods. Anxiety scales are often combined with measuring depression; for a more extensive overview of 34 tests measuring anxiety and depression combined, see reference 122. An example of a frequently used clinician rating scale is the Hamilton Anxiety Scale (HAM-A; http://www.psychiatrictimes.com/clinical-scales-anxiety/ham-hamilton-anxiety-scale). GAD, Generalized Anxiety Disorder; HADS, Hospital Anxiety and Depression Scale; PHQ, Patient Health Questionnaire; PROMIS, Patient-Reported Outcomes Measurement Information System

>1000 citations

State Trait Anxiety Inventory (STAI)
  Domains: Anxiety (state and trait)
  Number of items: 20 state and 20 trait items
  Recall period: Currently and generally
  Time to complete: 4-8 min per scale
  Psychometric properties: Good psychometric properties (internal consistency, 0.86-0.95; retest reliability, 0.65-0.89; proven validity: sensitivity, 0.82; specificity, 0.88), short versions. Norm data available.
  Strengths: Among the most widely researched and used measures, offered in 48 languages. State scale is sensitive to the detection of longitudinal change.
  Weaknesses: Relatively long instrument, high correlations between state and trait scale.
  Screening: X
  Royalty free: not marked
  Obtainable: Copyright: Mind Garden, 855 Oak Grove Avenue, Suite 215, Menlo Park, CA 94025, www.mindgarden.com/index.htm

Hospital Anxiety and Depression Scale (HADS)
  Domains: Anxiety and depression
  Number of items: 7 anxiety and 7 depression items
  Recall period: Past week
  Time to complete: 2-5 min
  Psychometric properties: Good psychometric properties (internal consistency, 0.76-0.80; retest reliability, 0.70; sensitivity and specificity for anxiety disorders, 0.85). Norm data available.
  Strengths: Very widely used screening measure, offered in various languages, short screener to detect the presence of clinically significant symptoms covering tension, worry, fear, panic, difficulty in relaxing, and restlessness.
  Weaknesses: Some evidence of reduced validity in the elderly.
  Screening: X
  Royalty free: not marked
  Obtainable: Copyright: Nfer Nelson, The Chiswick Centre, 414 Chiswick High Road, London, W4 5TF, UK, www.nfer-nelson.co.uk

100-1000 citations

Generalized Anxiety Disorder-7 (GAD-7)
  Domains: Anxiety (items reflect DSM-IV criteria for GAD)
  Number of items: 7 anxiety items
  Recall period: Over the last 2 weeks
  Time to complete: 5 min
  Psychometric properties: Good psychometric properties (internal consistency, 0.89; good reliability and convergent validity: sensitivity, 0.80; specificity, 0.86). Norm data available.
  Strengths: Offered in different languages, cut-off scores for GAD available.
  Weaknesses: Screening for GAD diagnosis only.
  Screening: X
  Royalty free: X
  Obtainable: www.phqscreeners.com

Beck Anxiety Inventory (BAI)
  Domains: Anxiety (cognitive and somatic components)
  Number of items: 21 anxiety items
  Recall period: Last week
  Time to complete: 5-10 min
  Psychometric properties: Good psychometric properties (internal consistency, >0.9; retest reliability, 0.6-0.9; correlation to HAMRS, 0.5; STAI, 0.5; BDI-II, r=0.66; SCL-90, 0.8; responsive: sensitivity, 0.67, specificity, 0.93). Norm data available.
  Strengths: Developed to minimize the overlap between depression and anxiety scales, youth-specific BAI available.
  Weaknesses: Focus on somatic aspects (eg, heart racing, dizziness) may overrate anxiety in medical conditions.
  Screening: X
  Royalty free: not marked
  Obtainable: Copyright: Pearson Assessment, www.pearsonassessments.com

Zung Anxiety Scale
  Domains: Anxiety (cognitive, autonomic, motor, central nervous system symptoms)
  Number of items: 20 anxiety items
  Recall period: During the past several days
  Time to complete: 10-15 min
  Psychometric properties: Moderate psychometric properties (internal consistency, 0.74-0.77); discriminates well between patients diagnosed with and without anxiety disorders; correlation between Zung and BDI, 0.59.
  Strengths: Frequently replicated psychometric results.
  Weaknesses: Several Zung items have higher correlations with the BDI than with the total Zung score.
  Screening: not marked
  Royalty free: X
  Obtainable: www.psychology-tools.com/zung-anxiety-scale/

<100 citations

Mood and Anxiety Symptoms Questionnaire (MASQ)
  Domains: Tripartite model (general distress, anhedonia, hyperarousal)
  Number of items: 90 items (short form: 30 items)
  Recall period: Past week
  Time to complete: 10 min (short version)
  Psychometric properties: Good psychometric properties (good internal consistency, >0.87; good validity). Norm data available.
  Strengths: Strong theoretical model.
  Weaknesses: Long version does not fit the 3-factor model very well.
  Screening: not marked
  Royalty free: X
  Obtainable: Contact author

Patient Health Questionnaire-4 (PHQ-4)
  Domains: Anxiety and depression
  Number of items: 2 anxiety and 2 depression items
  Recall period: Last 2 weeks
  Time to complete: 2 min
  Psychometric properties: Moderate psychometric properties for anxiety scale (internal consistency, 0.75; sensitivity, 0.86; specificity, 0.70). Norm data available.
  Strengths: Ultra-short screener for depression and anxiety.
  Weaknesses: Unlike the PHQ-2 and PHQ-9 for depression measurement, the PHQ-4 is not well accepted yet; however, the 2 depression items that are part of the PHQ-4 are widely used (PHQ-2).
  Screening: X
  Royalty free: X
  Obtainable: Available for free in multiple languages from www.phqscreeners.com

Penn State Worry Questionnaire
  Domains: Worries
  Number of items: 16 worry items
  Recall period: Current
  Time to complete: 10-15 min
  Psychometric properties: Good psychometric properties (high internal consistency; good test-retest reliability; good discriminant validity for GAD).
  Strengths: Detailed assessment of worries.
  Weaknesses: Restricted to worries.
  Screening: not marked
  Royalty free: X
  Obtainable: www.outcometracker.org/library/PSWQ.pdf

Anxiety Screening Questionnaire (ASQ-15; ref 113)
  Domains: Anxiety disorders (panic disorders and GAD)
  Number of items: 15 anxiety items
  Recall period: Last week
  Time to complete: 10 min
  Psychometric properties: Good psychometric properties (retest reliability, 0.6; sensitivity, >0.82; specificity, >0.70 for GAD).
  Strengths: Tested against a standardized clinical interview (CIDI).
  Weaknesses: Specificity is only sufficient for DSM-IV GAD.
  Screening: X
  Royalty free: not marked
  Obtainable: Contact author

Anxiety Disorder Diagnostic Questionnaire (ADDQ)
  Domains: Fear, anxiety/worry, escape/avoidance behaviors, physiological and distress symptoms, interference
  Number of items: 8 anxiety questions, 1 symptom list, 3 open questions
  Recall period: Current
  Time to complete: 10 min
  Psychometric properties: Good psychometric quality (good internal consistency; convergent and discriminant validity; sensitive to change).
  Strengths: Brief four-section index.
  Weaknesses: Not reported.
  Screening: X
  Royalty free: X
  Obtainable: www.midss.org/sites/default/files/addq.pdf

Twenty items assess "how you feel right now, at this
moment," and twenty items measure "how you generally
feel," with the intention to differentiate between
anxiety states and traits. Both scales have a good internal
consistency with a Cronbach α >0.90. However, as the
Cronbach α value depends on the number of items
of the scale, long scales usually have high Cronbach α
values. Both STAI scales show a high correlation with
depression scales, as the majority of items capture a
"negative mood" or "general distress" aspect (eg, feeling
tense, upset, frightened, indecisive, strained, etc) which
is also assessed by typical depression scales.
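
The dependence of Cronbach α on scale length can be made explicit with the standardized form of the coefficient, which needs only the number of items and the mean inter-item correlation. The sketch below is illustrative: the mean correlation of 0.3 is a hypothetical value, not an estimate for the STAI or any other instrument.

```python
def standardized_alpha(n_items, mean_inter_item_r):
    """Standardized Cronbach alpha from scale length and mean inter-item correlation."""
    return n_items * mean_inter_item_r / (1 + (n_items - 1) * mean_inter_item_r)

# Same hypothetical item quality (r = 0.3), different scale lengths.
for k in (2, 7, 20):
    print(k, round(standardized_alpha(k, 0.3), 2))
```

With the same mean inter-item correlation, α rises from about 0.46 for 2 items to about 0.75 for 7 items and about 0.90 for 20 items, so a high α for a long scale does not by itself indicate better items.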

An example of a shorter, more recently published tool
with a clinical background is the Generalized Anxiety
Disorder-7 (GAD-7). The item set of this instrument was
primarily developed to capture the anxiety construct as
defined by the DSM-IV. Although the GAD-7 uses only one
third as many items as the STAI, its psychometric
properties are almost as good.

Some tests, like the Hospital Anxiety and Depression
Scale (HADS), provide one scale for anxiety and
one for depression in the same instrument to address
the issue of content overlap. The HADS provides good
psychometric properties for both scales, but how distinct
the two scales are is influenced by the populations being
studied. The PHQ-4 is an example of an ultra-short
screening tool, which also allows for assessing anxiety
and depression with one tool. However, with only two
items per scale, the PHQ-4 is less precise; it is useful in
large epidemiological studies, but not well suited for
smaller studies or individual clinical decision making.



The BAI is a scale that aims to measure aspects of
anxiety that are most distinct from the depression
construct. Thus, the BAI focuses more on the evaluation of
"hyperarousal" (eg, heart pounding/racing, hands
trembling), targeting more the physical symptoms of anxiety.
The BAI therefore lends itself to the assessment of panic
disorders, phobic disorders, or PTSD.

A few instruments, like the Anxiety Screening
Questionnaire-15 (ASQ-15) or GAD-7, were primarily
constructed to screen for anxiety disorders. As we
previously stated, self-assessment tools used to identify
anxiety disorders often show less favorable psychometric
characteristics than screening tools for depressive
disorders. Nevertheless, if screening tools are used
carefully, they can still provide valuable information
for the primary care provider. Tools like the GAD-7,
which were primarily developed for screening purposes,
are also valuable, responsive "outcome" tools.
Furthermore, for many "outcome" measures, like the HADS or
STAI, cut-off scores have been established to also allow
screening for anxiety disorders, with good sensitivity
and specificity (Table I).

Specific anxiety measures 

Most instruments assessing specific anxiety disorders
evaluate some kind of social phobia, like the Social
Phobia Inventory (SPIN) or the Social Anxiety
Questionnaire (SAQ-A30). However, validated instruments
exist for a large number of other specific phobias, such
as the Dental Anxiety Questionnaire, Anxiety about
Death Questionnaire, Cardiac Anxiety Questionnaire,
Burn Specific Anxiety Scale, Preoperative
Anxiety Questionnaire, Anxiety and Preoccupation
of Sleep Questionnaire, Prostate Cancer-specific
Anxiety Scale, Radiotherapy Categorical Anxiety Scale,
Dyspnea-Related Anxiety, COPD Anxiety Questionnaire,
Pain Anxiety Questionnaire(s), Glasgow
Anxiety Scale for People with an Intellectual Disability,
Florida Shock Anxiety Scale, Faces Anxiety Scale
for Intensive Care Patients, Optometric Patient Anxiety
Scale, Pregnancy Anxiety Scale, or Psychotic
Anxiety Scale.

Examples of specific anxiety scales used outside of
the medical field include the Flight Anxiety Situations
Questionnaire, Separation Anxiety Symptoms Inventory,
Anxiety Control Questionnaire, A/D Goldberg
Questionnaire: Anxiety and Depression at Work, Test
Anxiety Questionnaire, Mathematics Anxiety Scale,
Statistical Anxiety Scale, Job Anxiety Scale, Social
Physique Anxiety Scale, Equine Anxiety Questionnaire,
Dating Anxiety Scale, Anxiety Scale for Music
Students, and Computer Anxiety Scale, among many
others.

Table II. Examples of specific anxiety questionnaires. An example of a frequently used clinician rating scale is the Liebowitz Social Anxiety Scale
(LSAS; http://healthnet.umassmed.edu/mhealth/LiebowitzSocialAnxietyScale.pdf). BDI, Beck Depression Inventory; DSM-IV, Diagnostic
and Statistical Manual of Mental Disorders IV; HAMRS, Hamilton Depression Rating Scale

<100 citations

Social Interaction Anxiety Scale (SIAS) and Social Phobia Scale (SPS; refs 83, 124-126)
  Domains: Two companion measures for social phobia fears: fear of being scrutinized and fear of general social interaction.
  Number of items: 20 items per scale
  Recall period: Current
  Time to complete: 10-25 min
  Psychometric properties: High internal consistency and retest reliability, good convergent and discriminant validity, sensitive to change.
  Strengths: Easy to score.
  Weaknesses: Restricted to social phobia diagnosis only.
  Screening: X
  Royalty free: X
  Obtainable: www.academia.edu/

Social Phobia Inventory (SPIN)
  Domains: Main spectrum of social phobia such as fear, avoidance, and physiological symptoms.
  Number of items: 17 items
  Recall period: During the past week
  Time to complete: 10-15 min
  Psychometric properties: Good internal consistency, retest reliability, convergent and divergent validity, sensitive to change, cut-off scores available.
  Strengths: Mini-SPIN with only 3 questions available.
  Weaknesses: Restricted to social phobia diagnosis only.
  Screening: X
  Royalty free: X
  Obtainable: http://psychology-tools.com/spin/

Social Anxiety Questionnaire (SAQ-A30)
  Domains: Social phobia/anxiety structured in five dimensions.
  Number of items: 30 items
  Recall period: Current
  Time to complete: 15-20 min
  Psychometric properties: Well-proven factor structure, good internal consistency, construct validity, cultural invariance, cut-off scores available.
  Strengths: Test developed based on several years of work by the research team in 18 Latin-American countries, Spain, and Portugal. Cross-culturally tested.
  Weaknesses: Restricted to social phobia diagnosis only.
  Screening: X
  Royalty free: X
  Obtainable: www.midss.org/sites/default/files/saq-a30 english.pdf

New measures

In addition to the numerous established tools, a few
new instruments developed using IRT methods have
emerged, which have the potential to significantly
improve self-reported anxiety measures and to provide a
common metric for many existing tools.

The first IRT item bank for anxiety was published
a decade ago in Germany. It showed that fourteen
different tools provide shared information which can be
scored on one common metric. This IRT item bank was
used to build the first CAT for the assessment of anxiety
(A-CAT), which has improved psychometric
characteristics compared with established tools of similar
length, and has been implemented in clinical practice
ever since.

Other groups have recently published a CAT for
anxiety in the US (CAT-ANX). Probably the most
advanced IRT item bank today is provided by the
PROMIS initiative. After intensive qualitative work
examining established static anxiety tests of different
lengths, a new large anxiety item bank was built during
an extensive quantitative development process.
The PROMIS anxiety item bank and CAT are publicly
available, royalty free, at the Assessment Center
(www.assessmentcenter.net/). Simulation studies show very
favorable results, and results from real CAT applications
are expected soon. For an overview of available
CATs for anxiety see Table III.

Choosing the "right" instrument

An essential question is how to choose the best anxiety
instrument out of the large set of available tools.
Obviously there is no best instrument for all research ques-
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Instrument 
<100 citations 


Domains 




Number of items 


Recall 
period 


Time to complete 


A rAT97-99 


Anxiety (unidimensional 
bipolar). 


2 348 clinical patients 
with/without different 
anxiety disorders 


i+Qmc ^-fiiii l-t3l^L'^ 
ju ixerris ^tuii uariK^ 

6 items (CAT) 


cu rrent 
up to 4 
weeks 


1 . / ± I.I nil ri witn 
precision based 
stopping rule 


CAT-ANX'"! 


Anxiety (multi-dimen- 
sional: mood, cognition, 
behavior, somatization) 


1 614 clinical patients 
with/without GAD 


431 items (full 
bank) 12 items 
(CAT) 


2 weeks 


2.5 ± 1.6 min 


CAT of the Mood 
and Anxiety Spec- 
trum Scales"^ 


Anxiety and depression 
(bi-factor model general 
factor: anxiety, somatic 
complaints). 


800 clinical patients 


626 items (full 
bank), 24 items 
(CAT) 


Not 
repor- 
ted 


Not reported 


PROMIS Anxiety- 
CAT" 


Anxiety (unidimensional 
unipolar). 


>15 000 mainly general 
population internet 
sample 


29 items (full 
bank), number of 
CAT items vary 


7 days 


Not reported 



Table III. Available computerized adaptive tests (CATs) for anxiety. CES-D HADS, Hospital Anxiety and Depression Scale; MASQ, Mood and Anxiety 
Symptom Questionnaire; STAI, State-Trait Anxiety Inventory 



tions, however, the following considerations may guide 
one's decision: 

• In our opinion the main concern should be whether 
the content of the items captures the content of in- 
terest. One should be cautious about the title of the 
instrument or the scales, as those sometimes misrep- 
resent the items. 

• A second thought should be given to the relation 
between measurement precision and respondent 
burden. In general, small sample sizes (or the use of 
anxiety measures in clinical practice) require higher 
measurement precision than larger sample sizes, eg, 
for epidemiological studies. Essentially, higher meas- 
urement precision requires more items and leads to 
higher respondent burden if static tests are used. In 
most cases, a compromise between precision and re- 
spondent burden needs to be found. Individually tai- 
lored, dynamic tools (CATs) can provide shorter and 
more precise measures, but their practical advantages 
need to be demonstrated. 

• Third, measurement precision should be considered
in relation to measurement range and the distribution
of the sample being studied. Pilot studies are
useful to identify floor or ceiling effects (a high
proportion of respondents at the scale extremes),
which compromise the interpretation of study results.
Traditional psychometric characteristics, like
Cronbach's α, do not provide information in this
respect, as the precision of an instrument is dependent
on the measurement range. Thus, we find
the term "reliable range" more useful. Modern
psychometric methods make the relationship between
measurement range and precision more transparent,
and may provide a rational guide to prefer one
tool above the other (see Figure 1; for more details
see ref 102).

• Last but not least, any tools considered should have 
a manual or an article describing the development 
sample, the psychometric model and methods used 
for development, as well as information about psy- 
chometric properties of the test, including reliabil- 
ity and validity results. Norm samples of the test 
are useful for interpretation and to compare results 
across studies.

Many self-assessment tools can be found online,
together with test descriptions (eg, www.proqolid.org or
www.psychometrikon.de).
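
The relation between measurement range, precision, and respondent burden described in the considerations above can be illustrated with a minimal sketch. It assumes a two-parameter logistic (2PL) IRT model; the item parameters are purely hypothetical and are not taken from any of the instruments discussed here. The threshold of 0.5 for the standard error is likewise an assumption; on a trait scale with a standard deviation of 1 it corresponds to a reliability of about 0.75. The last function is a deliberate simplification of a CAT: it selects items at a known trait level, whereas a real CAT selects them at the current trait estimate.

```python
import math

def item_information(theta, a, b):
    """Fisher information of a 2PL item (discrimination a, difficulty b) at trait level theta."""
    p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
    return a * a * p * (1.0 - p)

def standard_error(theta, items):
    """Standard error of measurement at theta: 1 / sqrt(test information)."""
    info = sum(item_information(theta, a, b) for a, b in items)
    return 1.0 / math.sqrt(info)

def reliable_range(items, max_se, grid):
    """Grid points at which the standard error stays at or below max_se."""
    return [t for t in grid if standard_error(t, items) <= max_se]

def items_needed(theta, items, max_se):
    """Number of most informative items needed to reach max_se at theta.

    Simplified precision-based stopping rule; returns None if the
    target precision cannot be reached with the available items.
    """
    infos = sorted((item_information(theta, a, b) for a, b in items), reverse=True)
    total = 0.0
    for n, info in enumerate(infos, start=1):
        total += info
        if 1.0 / math.sqrt(total) <= max_se:
            return n
    return None

# Hypothetical item bank: (discrimination, difficulty) pairs that
# target average to elevated anxiety levels.
items = [(2.0, -0.5), (2.0, 0.0), (2.0, 0.25), (2.0, 0.5),
         (2.0, 0.75), (2.0, 1.0), (2.0, 1.5), (2.0, 2.0)]
grid = [x / 2 for x in range(-8, 9)]  # trait levels from -4.0 to 4.0

covered = reliable_range(items, max_se=0.5, grid=grid)
print("Reliable range (SE <= 0.5):", covered)
print("Items needed at theta = 1.0:", items_needed(1.0, items, 0.5))
print("Items needed at theta = -4.0:", items_needed(-4.0, items, 0.5))
```

In this sketch the same set of items is precise only within a limited band of trait levels, which is the information a single coefficient such as Cronbach's α cannot convey. Within that band, a subset of well-targeted items is enough to meet the precision target, which is the rationale for tailored testing.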

Psychometric properties | Strengths | Weaknesses | Screening | Royalty free | Obtainable
------------------------|-----------|------------|-----------|--------------|-----------
Convergent and discriminant validity shown: correlation with legacy tools: 0.60 STAI, 0.66 HADS (anxiety scale). | First CAT built to measure anxiety, integrated into clinical routines for a decade. | Inconsistent recall periods as it combines items from different existing measures for a common metric. | | X [column uncertain] | Contact authors
CAT was strongly related to GAD diagnosis: sensitivity, 0.65; specificity, 0.93; CAT-DI/CAT-ANX: 0.82. | Based on review of 100 depression and anxiety scales. | Low sensitivity for GAD; long CAT. | X | | Contact authors
Correlation to legacy tools not reported. | Hierarchical bifactor model. | Longest CAT. | | | Contact authors
Correlation to MASQ, 0.80; CES-D, 0.75. | Based on review of 145 anxiety scales, extensive state-of-the-art qualitative and quantitative item bank development. | High correlation between anxiety and depression PROMIS item banks, 0.81. | | X | www.assessmentcenter.net/; www.nihpromis.org

Table III. Continued

Modes of assessment

As important as the instrument being considered, in
our opinion, is the mode of assessment for a successful
use of self-assessment tools. In general, the requirements
for the use of PRO tools differ between clinical
practice and research. For clinical practice settings, an
individual report must be provided without delay to be
useful for clinical decision making. For clinical research,
aggregated reports or data banks are usually sufficient
for a study.

We have developed and used different electronic PRO
(ePRO) systems at our clinic for almost two decades,
and we believe that only electronic data collection
methods meet the clinical standards in busy routine
care. In addition, ePRO systems are less expensive than
paper-pencil assessments, as they reduce staff time for
administration, scoring, and report preparation
substantially.

For clinical research purposes this decision is more
complex. First of all, it depends on whether an electronic
data capturing system is already available, or whether an
open source (eg, www.limesurvey.org) or a commercial
system (eg, www.unipark.de) that meets data protection
requirements can be used. Second, electronic assessments
are often favored as they reduce missing data and avoid
false data entries, but if sample sizes are small,
paper-pencil assessments may be more convenient and less
expensive than programming an assessment. However,
other differences between electronic and paper-pencil
assessment may be considered as well, and have been
discussed elsewhere in detail.^"* From a psychometric
perspective many modes of assessment can be used
interchangeably, including paper-pencil assessment,
computer assessments on desktop computers or smartphones,
as well as interactive voice response (IVR),
whereas telephone interviews are typically more biased
by social desirability.^"^

Outlook 

The empirical assessment of anxiety and other mental 
health symptoms is essential for evidence-based medi- 
cine. There is a plethora of instruments available, which 
can provide valid and reliable assessments of anxiety. 
However, the measurement of psychological constructs 
is not as established as the measurement of biomarkers. 
One reason for this may be that all instruments provide 
different scores, making intuitive interpretation and 
communication more difficult. 

For a greater acceptance in the medical field, be- 
yond mental health care, we believe we need to move 
away from an instrument-defined measurement and 
towards a construct-defined measurement system. New 
developments, such as IRT and CATs, make different 
instruments easily comparable by a standardized com- 
mon metric, like different thermometers measuring on 
the same temperature scale.

Assessment of anxiety symptoms as perceived by the
patient

Patient self-reported symptoms are of crucial importance
for identifying anxiety disorders, as well as for
monitoring their treatment in clinical practice and
research. For evidence-based medicine, therefore, a
precise, reliable, and valid (ie, "objective") assessment
of the "subjective" symptoms perceived by the patient is
warranted. A large number of instruments are available
that can provide psychometrically sound assessments of
anxiety, but some limitations of current tools need to be
carefully considered for their successful use.
Nevertheless, in medicine the empirical assessment of
mental health status is not as accepted as that of
biomarkers. One reason for this may be that different
instruments assessing the same psychological construct
use different scales. This article presents some new
developments that promise to provide a common metric for
the assessment of anxiety and thus to facilitate the
general acceptance of mental health assessments in the
future.



Assessment of anxiety symptoms reported by
patients

Symptoms reported by patients are very important for
identifying anxiety disorders and for monitoring their
treatment in clinical research and practice. A precise,
reliable, and valid (ie, "objective") assessment of the
"subjective" symptoms reported by the patient is
therefore warranted for evidence-based medicine. Very
many instruments are available that can assess anxiety
psychometrically, but certain limitations of current
tools deserve careful analysis for successful use. The
empirical assessment of mental health status is
nevertheless not as well accepted in medicine as the
assessment of biomarkers, which may be explained by the
use of different instruments that assess the same
psychological construct with different scales. In this
article we present new advances that promise a common
measure for assessing anxiety, in order to facilitate the
general acceptance of mental health assessment in the
future.
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